Method of constrained layer-wise video coding

ABSTRACT

A method and apparatus of video coding using a multi-layer prediction mode are disclosed. According to one method, a bitstream is generated at an encoder side or received at a decoder side, where the bitstream corresponds to coded data of current video data in a current layer. The bitstream complies with a bitstream conformance requirement corresponding to both the bit depth values for the current layer and the reference layer being the same and the chroma format index values for the current layer and the reference layer being the same. The current video data in the current layer is then encoded or decoded by utilizing reference video data in the reference layer.

CROSS REFERENCE TO RELATED APPLICATIONS

The present invention is a Divisional of pending U.S. patent application Ser. No. 17/121,927, filed on Dec. 15, 2020, which claims priority to U.S. Provisional Patent Application, Ser. No. 62/948,971, filed on Dec. 17, 2019 and U.S. Provisional Patent Application, Ser. No. 62/954,019, filed on Dec. 27, 2019. The U.S. Provisional Patent Applications are hereby incorporated by reference in their entireties.

FIELD OF THE INVENTION

The present invention relates to layer-wise video coding. In particular, the present invention relates to constraining parameters for layer-wise video coding to ensure a proper motion compensation process for color video.

BACKGROUND AND RELATED ART

In VVC Draft 7 (B. Bross, et al., “Versatile Video Coding (Draft 7)”, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 16th Meeting: Geneva, CH, 1-11 Oct. 2019, Document: JVET-P2001), layer-wise coding is supported. The layer structure is defined in the video parameter set (VPS) as shown in Table 1.

The syntax element vps_max_layers_minus1 specifies the number of video layers in this VPS structure, minus 1. The syntax elements vps_all_independent_layers_flag, vps_independent_layer_flag[i], and vps_direct_ref_layer_flag[i][j] specify the inter-layer data reference dependency.

TABLE 1 Video Parameter Set (VPS) in VVC to support the layer

video_parameter_set_rbsp( ) {                                                   Descriptor
  vps_video_parameter_set_id                                                    u(4)
  vps_max_layers_minus1                                                         u(6)
  vps_max_sublayers_minus1                                                      u(3)
  if( vps_max_layers_minus1 > 0 && vps_max_sublayers_minus1 > 0 )
    vps_all_layers_same_num_sublayers_flag                                      u(1)
  if( vps_max_layers_minus1 > 0 )
    vps_all_independent_layers_flag                                             u(1)
  for( i = 0; i <= vps_max_layers_minus1; i++ ) {
    vps_layer_id[ i ]                                                           u(6)
    if( i > 0 && !vps_all_independent_layers_flag ) {
      vps_independent_layer_flag[ i ]                                           u(1)
      if( !vps_independent_layer_flag[ i ] )
        for( j = 0; j < i; j++ )
          vps_direct_ref_layer_flag[ i ][ j ]                                   u(1)
    }
  }
  if( vps_max_layers_minus1 > 0 ) {
    if( vps_all_independent_layers_flag )
      each_layer_is_an_ols_flag                                                 u(1)
    if( !each_layer_is_an_ols_flag ) {
      if( !vps_all_independent_layers_flag )
        ols_mode_idc                                                            u(2)
      if( ols_mode_idc = = 2 ) {
        num_output_layer_sets_minus1                                            u(8)
        for( i = 1; i <= num_output_layer_sets_minus1; i ++ )
          for( j = 0; j <= vps_max_layers_minus1; j++ )
            ols_output_layer_flag[ i ][ j ]                                     u(1)
      }
    }
  }
  vps_num_ptls                                                                  u(8)
  for( i = 0; i < vps_num_ptls; i++ ) {
    if( i > 0 )
      pt_present_flag[ i ]                                                      u(1)
    if( vps_max_sublayers_minus1 > 0 && !vps_all_layers_same_num_sublayers_flag )
      ptl_max_temporal_id[ i ]                                                  u(3)
  }
  while( !byte_aligned( ) )
    vps_ptl_byte_alignment_zero_bit /* equal to 0 */                            u(1)
  for( i = 0; i < vps_num_ptls; i++ )
    profile_tier_level( pt_present_flag[ i ], ptl_max_temporal_id[ i ] )
  for( i = 0; i < TotalNumOlss; i++ )
    if( NumLayersInOls[ i ] > 1 && vps_num_ptls > 1 )
      ols_ptl_idx[ i ]                                                          u(8)
  if( !vps_all_independent_layers_flag )
    vps_num_dpb_params                                                          ue(v)
  if( vps_num_dpb_params > 0 ) {
    same_dpb_size_output_or_nonoutput_flag                                      u(1)
    if( vps_max_sublayers_minus1 > 0 )
      vps_sublayer_dpb_params_present_flag                                      u(1)
  }
  for( i = 0; i < vps_num_dpb_params; i++ ) {
    dpb_size_only_flag[ i ]                                                     u(1)
    if( vps_max_sublayers_minus1 > 0 && !vps_all_layers_same_num_sublayers_flag )
      dpb_max_temporal_id[ i ]                                                  u(3)
    dpb_parameters( dpb_size_only_flag[ i ], dpb_max_temporal_id[ i ],
        vps_sublayer_dpb_params_present_flag )
  }
  for( i = 0; i < vps_max_layers_minus1 && vps_num_dpb_params > 1; i++ ) {
    if( !vps_independent_layer_flag[ i ] )
      layer_output_dpb_params_idx[ i ]                                          ue(v)
    if( LayerUsedAsRefLayerFlag[ i ] && !same_dpb_size_output_or_nonoutput_flag )
      layer_nonoutput_dpb_params_idx[ i ]                                       ue(v)
  }
  vps_general_hrd_params_present_flag                                           u(1)
  if( vps_general_hrd_params_present_flag ) {
    general_hrd_parameters( )
    if( vps_max_sublayers_minus1 > 0 )
      vps_sublayer_cpb_params_present_flag                                      u(1)
    if( TotalNumOlss > 1 )
      num_ols_hrd_params_minus1                                                 ue(v)
    for( i = 0; i <= num_ols_hrd_params_minus1; i++ ) {
      if( vps_max_sublayers_minus1 > 0 && !vps_all_layers_same_num_sublayers_flag )
        hrd_max_tid[ i ]                                                        u(3)
      firstSubLayer = vps_sublayer_cpb_params_present_flag ? 0 : hrd_max_tid[ i ]
      ols_hrd_parameters( firstSubLayer, hrd_max_temporal_id[ i ] )
    }
    if( num_ols_hrd_params_minus1 > 0 )
      for( i = 1; i < TotalNumOlss; i++ )
        ols_hrd_idx[ i ]                                                        ue(v)
  }
  vps_extension_flag                                                            u(1)
  if( vps_extension_flag )
    while( more_rbsp_data( ) )
      vps_extension_data_flag                                                   u(1)
  rbsp_trailing_bits( )
}

In each layer, one or more sequence parameter sets (SPS) are signalled. The SPS contains much of the video information, such as maximum picture width/height, chroma format, CTU size, etc. Inter-layer prediction can be supported in VVC. Between two different layers, the reconstructed pictures of the lower layer can be used as the reference pictures of the higher layer. If the picture sizes are different, reference picture resampling is used to generate the prediction blocks. Therefore, two layers with different picture sizes are not a problem for layer-wise referencing in VVC. However, if the chroma formats of the two layers are different, inter-layer prediction may run into problems. For example, if the lower layer is a monochrome coded layer and the higher layer is in 4:2:0 chroma format, the chroma component predictor cannot be generated by using inter-layer prediction.

BRIEF SUMMARY OF THE INVENTION

A method and apparatus of video coding using a multi-layer prediction mode are disclosed. According to one method, a bitstream is generated at an encoder side or received at a decoder side, where the bitstream corresponds to coded data of current video data in a current layer. The bitstream complies with a bitstream conformance requirement corresponding to both the bit depth values for the current layer and the reference layer being the same and the chroma format index values for the current layer and the reference layer being the same. The current video data in the current layer is then encoded or decoded by utilizing reference video data in the reference layer.

In one embodiment, the bit depth values for the current layer and the reference layer are derived from bit_depth_minus8 syntax elements in the bitstream, where the bit_depth_minus8 syntax elements correspond to the bit depth values minus 8 for the current layer and the reference layer respectively.

In one embodiment, the chroma format index values for the current layer and the reference layer are determined according to chroma_format_idc syntax elements in the bitstream, where the chroma_format_idc syntax elements specify chroma sampling relative to luma sampling for the current layer and the reference layer respectively.

According to another method, a bitstream is generated at an encoder side or received at a decoder side, where the bitstream corresponds to coded data of current video data in a current layer. The bitstream complies with a bitstream conformance requirement comprising a first condition, a second condition, or both the first condition and the second condition. The first condition is that a first bit-depth for the current layer is greater than or equal to a second bit-depth for a reference layer. The second condition is that a first chroma format index for the current layer is greater than or equal to a second chroma format index for the reference layer, where the first chroma format index and the second chroma format index specify chroma sampling relative to luma sampling for the current layer and the reference layer respectively. A larger chroma format index value indicates a higher sampling density. The current video data in the current layer is then encoded or decoded by utilizing reference video data in the reference layer along with the first bit-depth and the second bit-depth, the first chroma format index and the second chroma format index, or both the first bit-depth and the second bit-depth and the first chroma format index and the second chroma format index.

According to yet another method, input data are received, where the input data correspond to video data in a current layer at a video encoder side or the input data correspond to coded video data in the current layer at a video decoder side. Motion compensation is applied to the video data in the current layer at the encoder side or to the coded video data in the current layer at the video decoder side by utilizing reference video data in a reference layer. The motion compensation utilizes information comprising chroma formats or one or more variables related to the chroma format for both the current layer and the reference layer, bit-depth values for both the current layer and the reference layer, or both the chroma formats or said one or more variables related to the chroma format for both the current layer and the reference layer and the bit-depth values for both the current layer and the reference layer.

In one embodiment, the chroma formats or said one or more variables comprise a first variable corresponding to a horizontal chroma sub-sampling factor, a second variable corresponding to a vertical chroma sub-sampling factor, or both the first variable and the second variable.

In another embodiment, the bit-depth values are derived from syntax elements signaled in a bitstream or parsed from the bitstream, where the syntax elements correspond to the bit-depth values minus 8 for the current layer and the reference layer respectively.

In another embodiment, the reference sample position of the reference layer is calculated in the motion compensation process by utilizing the horizontal chroma sub-sampling factor, the vertical chroma sub-sampling factor, or both.

In another embodiment, the reference sample position of the reference layer is calculated in the motion compensation process by utilizing the ratio of the horizontal chroma sub-sampling factor of the current layer to the horizontal chroma sub-sampling factor of the reference layer, the ratio of the vertical chroma sub-sampling factor of the current layer to the vertical chroma sub-sampling factor of the reference layer, or both.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an exemplary block diagram of a system incorporating constrained layer-wise video coding according to an embodiment of the present invention, where both the bit depth values and the chroma format index values for the current layer and the reference layer are the same.

FIG. 2 illustrates an exemplary block diagram of a system incorporating constrained layer-wise video coding according to an embodiment of the present invention, where the bit-depth for the current layer is greater than or equal to the bit-depth for a reference layer, or the chroma format index for the current layer is greater than or equal to the chroma format index for the reference layer.

FIG. 3 illustrates an exemplary block diagram of a system incorporating constrained layer-wise video coding according to an embodiment of the present invention, where the motion compensation utilizes information comprising chroma formats or one or more variables related to the chroma format for both the current layer and the reference layer, or bit-depth values for both the current layer and the reference layer.

DETAILED DESCRIPTION OF THE INVENTION

The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.

It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.

Reference throughout this specification to “one embodiment,” “an embodiment,” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of the invention.

The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.

In the description, like reference numbers appearing in the drawings and description designate corresponding or like elements among the different views.

Method-1: Constraint of the Chroma Format

In this invention, the chroma format of different layers shall be constrained. In one embodiment, the chroma formats of two or more layers that have a dependency (e.g. one being referenced by another) shall be the same. The chroma format of the higher layer shall be the same as that of the lower reference layers. For example, it is a requirement of bitstream conformance that the value of chroma_format_idc of the current layer shall be the same as the value of chroma_format_idc of the reference layers. As is known in the VVC draft standard, the chroma format index specifies chroma sampling relative to luma sampling. chroma_format_idc with a value of 0, 1, 2 and 3 corresponds to a chroma format of monochrome, 4:2:0, 4:2:2, and 4:4:4 respectively. The 4:2:0 (also referred to as 420) format corresponds to horizontal and vertical 2:1 sub-sampling. The 4:2:2 (also referred to as 422) format corresponds to horizontal 2:1 sub-sampling. The 4:4:4 (also referred to as 444) format corresponds to no sub-sampling in the horizontal or vertical direction. Therefore, a larger chroma format index value indicates a higher chroma sampling density. In another example, it is a requirement of bitstream conformance that the value of chroma_format_idc of the current picture shall be the same as the value of chroma_format_idc of the reference pictures.

In another embodiment, the values of bit-depth related syntax elements (e.g. bit_depth_minus8, bit_depth_luma_minus8, and/or bit_depth_chroma_minus8) of the current layer are also constrained. For example, it is a requirement of bitstream conformance that the values of bit-depth related syntax elements (e.g. bit_depth_minus8, bit_depth_luma_minus8, and/or bit_depth_chroma_minus8) of the current layer shall be the same as those of the reference layers.
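The equality constraint described above can be sketched as a simple check. This is a minimal illustration only: the LayerFormat record and the conforms_same_format function are hypothetical names introduced here, and a real conformance checker would read these values from each layer's SPS.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LayerFormat:
    """Hypothetical per-layer record of the two constrained syntax elements."""
    chroma_format_idc: int  # 0: monochrome, 1: 4:2:0, 2: 4:2:2, 3: 4:4:4
    bit_depth_minus8: int


def conforms_same_format(current: LayerFormat,
                         reference_layers: Iterable[LayerFormat]) -> bool:
    """Return True when every reference layer matches the current layer in
    both chroma_format_idc and bit_depth_minus8."""
    return all(
        ref.chroma_format_idc == current.chroma_format_idc
        and ref.bit_depth_minus8 == current.bit_depth_minus8
        for ref in reference_layers
    )
```

Under this sketch, a 4:2:0 10-bit layer that references a monochrome 10-bit layer would be reported as non-conforming.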

In another embodiment, the value of the separate colour plane flag (e.g. separate_colour_plane_flag) of the current layer is also constrained. For example, it is a requirement of bitstream conformance that the value of the separate colour plane flag (e.g. separate_colour_plane_flag) of the current layer shall be the same as that of the reference layers. In another embodiment, the values of the chroma phase flags (e.g. sps_chroma_horizontal_collocated_flag and sps_chroma_vertical_collocated_flag) of the current layer are also constrained. For example, it is a requirement of bitstream conformance that the values of the chroma phase flags (e.g. sps_chroma_horizontal_collocated_flag and sps_chroma_vertical_collocated_flag) of the current layer shall be the same as those of the reference layers.

In another embodiment, the value of the chroma format index (e.g. chroma_format_idc) of the higher layer shall be greater than or equal to that of the lower layer (e.g. the reference layer). For example, it is a requirement of bitstream conformance that the value of chroma_format_idc of the current layer shall be greater than or equal to the value of chroma_format_idc of the reference layers. In another example, it is a requirement of bitstream conformance that the value of chroma_format_idc of the current picture shall be greater than or equal to the value of chroma_format_idc of the reference pictures. When the chroma_format_idc value of the current layer/picture is greater than that of the reference layer/picture, the reference layer/picture has fewer chroma samples than the current layer/picture, so the chroma motion compensation needs to perform the interpolation in the subsampled domain. For example, if the chroma format of the current layer/picture is 4:4:4 and the chroma format of the reference layer/picture is 4:2:0, the chroma picture size of the reference layer/picture is treated as half the size of the current picture in width and height.

In one embodiment, the scaling ratios of luma samples and chroma samples are derived separately. The scaling window offsets (e.g. scaling_win_left_offset and scaling_win_top_offset) of luma samples and chroma samples are also derived separately. If the reference layer has fewer color components (e.g. monochrome), a predefined, derived, or signaled value is assigned to the predictors of the missing color components. For example, the value of (1<<(bit_depth−1)) can be used as the predictor of the missing color components.
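As a small illustration of the fallback value mentioned above, the following sketch fills a predictor block for a missing color component with (1<<(bit_depth−1)). The function name and the list-of-rows block layout are assumptions made for this example.

```python
def missing_component_predictor(bit_depth: int, width: int, height: int):
    """Build a width x height predictor block filled with the mid-range
    value (1 << (bit_depth - 1)) for a color component that the reference
    layer does not carry."""
    mid_value = 1 << (bit_depth - 1)
    return [[mid_value] * width for _ in range(height)]
```

For 10-bit video the fill value is 512, and for 8-bit video it is 128.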

In another embodiment, the value of the chroma format index (e.g. chroma_format_idc) of the higher layer shall be less than or equal to that of the lower layer (e.g. the reference layer). For example, it is a requirement of bitstream conformance that the value of chroma_format_idc of the current layer shall be less than or equal to the value of chroma_format_idc of the reference layers. In another example, it is a requirement of bitstream conformance that the value of chroma_format_idc of the current picture shall be less than or equal to the value of chroma_format_idc of the reference pictures. When the chroma_format_idc value of the current layer/picture is less than that of the reference layer/picture, the reference layer/picture has more chroma samples than the current layer/picture. Therefore, the chroma motion compensation needs to perform the interpolation in the upsampled domain. For example, if the chroma format of the current layer/picture is 4:2:0 and the chroma format of the reference layer/picture is 4:4:4, the chroma picture size of the reference layer/picture is treated as double the size of the current picture in width and height.

In one embodiment, the scaling ratios of luma samples and chroma samples are derived separately. The scaling window offsets (e.g. scaling_win_left_offset and scaling_win_top_offset) of luma samples and chroma samples are also derived separately. In another embodiment, when the reference layer/picture has a greater value of chroma_format_idc than that of the current layer/picture, a chroma sample subsampling process is applied in advance or on-the-fly on the reference layer/picture to match the chroma format of the current layer/picture.
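The text does not specify how the chroma subsampling is carried out. The sketch below assumes plain decimation (keeping every N-th sample with no low-pass filtering), which is only one possible choice; the function name and the list-of-rows plane layout are hypothetical.

```python
def subsample_reference_chroma(plane, ref_sub_w, ref_sub_h, cur_sub_w, cur_sub_h):
    """Decimate a reference chroma plane so that its sampling grid matches
    the current layer. The four factors are the SubWidthC/SubHeightC values
    of the reference and current layers. Plain decimation is an assumption
    of this sketch, not a requirement of the text."""
    if cur_sub_w % ref_sub_w or cur_sub_h % ref_sub_h:
        raise ValueError("current factors must be multiples of reference factors")
    step_x = cur_sub_w // ref_sub_w
    step_y = cur_sub_h // ref_sub_h
    return [row[::step_x] for row in plane[::step_y]]
```

For a 4:4:4 reference (factors 1, 1) and a 4:2:0 current layer (factors 2, 2), every second sample is kept in each direction.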

Method-2: Inferred Value of chroma_format_idc for Higher Layer

In this invention, one or more inter-layer reference/prediction syntax elements are signalled in the SPS or the PPS, such as inter_layer_ref_pics_present_flag. If the syntax element indicates that inter-layer referencing/prediction is used, one or more syntax elements are skipped and the values of those syntax elements are inferred. For example, the value of chroma_format_idc is inferred to be the same as the value of chroma_format_idc of the reference layer. In another example, the values of bit-depth related syntax elements (e.g. bit_depth_minus8, bit_depth_luma_minus8, and/or bit_depth_chroma_minus8) of the current layer are also inferred to be the same as those of the reference layer if inter-layer referencing/prediction is used. In another example, the value of the separate colour plane flag (e.g. separate_colour_plane_flag) of the current layer is also inferred to be the same as that of the reference layer if inter-layer referencing/prediction is used. In another example, the values of the chroma phase flags (e.g. sps_chroma_horizontal_collocated_flag and sps_chroma_vertical_collocated_flag) of the current layer are also inferred to be the same as those of the reference layer if inter-layer referencing/prediction is used.
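The inference rule can be sketched as follows. The dictionary-based SPS representation and the function name are assumptions made for illustration; only the syntax element names come from the text.

```python
# Syntax elements whose values are inferred rather than parsed when
# inter-layer referencing/prediction is in use.
INFERRED_FIELDS = (
    "chroma_format_idc",
    "bit_depth_minus8",
    "separate_colour_plane_flag",
    "sps_chroma_horizontal_collocated_flag",
    "sps_chroma_vertical_collocated_flag",
)


def resolve_format_fields(parsed_sps: dict, ref_layer_sps: dict) -> dict:
    """Return a copy of the current layer's SPS values. When
    inter_layer_ref_pics_present_flag is set, the format-related fields are
    taken from the reference layer instead of from the current SPS."""
    resolved = dict(parsed_sps)
    if parsed_sps.get("inter_layer_ref_pics_present_flag"):
        for name in INFERRED_FIELDS:
            resolved[name] = ref_layer_sps[name]
    return resolved
```

Because the flag decides whether the other fields are present, it has to be parsed before them, which is why the text goes on to move those fields after the inter-layer syntax elements.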

The syntax elements chroma_format_idc, bit_depth_minus8, bit_depth_luma_minus8, bit_depth_chroma_minus8, separate_colour_plane_flag, sps_chroma_horizontal_collocated_flag, and sps_chroma_vertical_collocated_flag are accordingly moved to a place after the inter-layer referencing/prediction syntax elements in the SPS or the PPS, wherever relevant.

Method-3: Considering Chroma Format in Motion Compensation

In this invention, the chroma formats of the current layer/picture and the reference layer/picture are taken into consideration in the motion compensation process. When performing motion compensation, the chroma format or the variables related to the chroma format (e.g. SubWidthC and RefSubWidthC) of the current layer and the reference layer are considered. The SubWidthC and SubHeightC values derived from chroma_format_idc and separate_colour_plane_flag are shown in Table 2. For example, a reference sample position of the reference layer is calculated in the motion compensation process by utilizing the SubWidthC, the SubHeightC, or both. In another example, a reference sample position of the reference layer is calculated in the motion compensation process by utilizing the ratio of the SubWidthC of the current layer to the SubWidthC of the reference layer, the ratio of the SubHeightC of the current layer to the SubHeightC of the reference layer, or both.

TABLE 2 SubWidthC and SubHeightC values derived from chroma_format_idc and separate_colour_plane_flag

chroma_format_idc  separate_colour_plane_flag  Chroma format  SubWidthC  SubHeightC
0                  0                           Monochrome     1          1
1                  0                           4:2:0          2          2
2                  0                           4:2:2          2          1
3                  0                           4:4:4          1          1
3                  1                           4:4:4          1          1
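Table 2 can be expressed as a direct lookup. The function name is hypothetical; the values are exactly those listed in the table.

```python
# (chroma_format_idc, separate_colour_plane_flag) -> (SubWidthC, SubHeightC)
SUBSAMPLING_TABLE = {
    (0, 0): (1, 1),  # monochrome
    (1, 0): (2, 2),  # 4:2:0
    (2, 0): (2, 1),  # 4:2:2
    (3, 0): (1, 1),  # 4:4:4
    (3, 1): (1, 1),  # 4:4:4 with separate colour planes
}


def chroma_subsampling(chroma_format_idc: int, separate_colour_plane_flag: int = 0):
    """Return (SubWidthC, SubHeightC) for the given format, per Table 2."""
    return SUBSAMPLING_TABLE[(chroma_format_idc, separate_colour_plane_flag)]
```

Calling the function once for the current layer and once for the reference layer yields the (SubWidthC, SubHeightC) and (RefSubWidthC, RefSubHeightC) pairs used in the motion compensation process.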

In the following, we illustrate part of the decoding process of chroma motion compensation in VVC. The scalingRatio[0] and scalingRatio[1] are the horizontal and vertical scaling ratios derived from the luma scaling window. The SubWidthC and SubHeightC are the chroma subsampling ratios (relative to luma samples) in the horizontal and vertical directions as shown in Table 2. The refMvCLX is the chroma MV.

addX = sps_chroma_horizontal_collocated_flag ? 0 : 8 * ( scalingRatio[0] − ( 1 << 14 ) )

addY = sps_chroma_vertical_collocated_flag ? 0 : 8 * ( scalingRatio[1] − ( 1 << 14 ) )

refxSb_C = ( ( ( xSb − scaling_win_left_offset ) / SubWidthC << 5 ) + refMvCLX[0] ) * scalingRatio[0] + addX

refx_C = ( ( Sign( refxSb_C ) * ( ( Abs( refxSb_C ) + 256 ) >> 9 ) + xC * ( ( scalingRatio[0] + 8 ) >> 4 ) ) + fRefLeftOffset / SubWidthC + 16 ) >> 5

refySb_C = ( ( ( ySb − scaling_win_top_offset ) / SubHeightC << 5 ) + refMvCLX[1] ) * scalingRatio[1] + addY

refy_C = ( ( Sign( refySb_C ) * ( ( Abs( refySb_C ) + 256 ) >> 9 ) + yC * ( ( scalingRatio[1] + 8 ) >> 4 ) ) + fRefTopOffset / SubHeightC + 16 ) >> 5

In the above equations, (refxSb_C, refySb_C) and (refx_C, refy_C) are chroma locations pointed to by a motion vector (refMvCLX[0], refMvCLX[1]) given in 1/32-sample units. To support motion compensation with different chroma formats, the RefSubWidthC and RefSubHeightC are derived to be equal to the SubWidthC and SubHeightC of the reference layer/picture, respectively. The motion compensation process is modified as follows:

addX = sps_chroma_horizontal_collocated_flag ? 0 : 8 * ( scalingRatio[0] − ( 1 << 14 ) )

addY = sps_chroma_vertical_collocated_flag ? 0 : 8 * ( scalingRatio[1] − ( 1 << 14 ) )

refxSb_C = ( ( ( xSb − scaling_win_left_offset ) / SubWidthC << 5 ) + refMvCLX[0] ) * scalingRatio[0] + addX

refx_C = ( ( Sign( refxSb_C ) * ( ( Abs( refxSb_C ) + 256 ) >> 9 ) + xC * ( ( scalingRatio[0] + 8 ) >> 4 ) ) * ( SubWidthC / RefSubWidthC ) + fRefLeftOffset / RefSubWidthC + 16 ) >> 5

refySb_C = ( ( ( ySb − scaling_win_top_offset ) / SubHeightC << 5 ) + refMvCLX[1] ) * scalingRatio[1] + addY

refy_C = ( ( Sign( refySb_C ) * ( ( Abs( refySb_C ) + 256 ) >> 9 ) + yC * ( ( scalingRatio[1] + 8 ) >> 4 ) ) * ( SubHeightC / RefSubHeightC ) + fRefTopOffset / RefSubHeightC + 16 ) >> 5
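The horizontal part of the modified derivation can be sketched as below. Two points are assumptions of this sketch rather than statements of the text: the "/" operator is modeled as integer division truncating toward zero, and the factor (SubWidthC / RefSubWidthC) is applied as multiply-then-divide so that a ratio such as 1/2 does not truncate to zero. The function and parameter names are hypothetical.

```python
def sign(value: int) -> int:
    return (value > 0) - (value < 0)


def div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero (b is assumed positive)."""
    return a // b if a >= 0 else -((-a) // b)


def chroma_ref_x(x_sb, x_c, mv_c_x, scaling_ratio_x, scaling_win_left_offset,
                 f_ref_left_offset, sub_width_c, ref_sub_width_c, hor_collocated):
    """Compute refx_C in 1/32 chroma-sample units of the reference picture."""
    add_x = 0 if hor_collocated else 8 * (scaling_ratio_x - (1 << 14))
    ref_x_sb = ((div_trunc(x_sb - scaling_win_left_offset, sub_width_c) << 5)
                + mv_c_x) * scaling_ratio_x + add_x
    base = (sign(ref_x_sb) * ((abs(ref_x_sb) + 256) >> 9)
            + x_c * ((scaling_ratio_x + 8) >> 4))
    # Ratio SubWidthC / RefSubWidthC, applied multiply-then-divide (assumption).
    scaled = div_trunc(base * sub_width_c, ref_sub_width_c)
    return (scaled + div_trunc(f_ref_left_offset, ref_sub_width_c) + 16) >> 5
```

With no scaling (scalingRatio equal to 1<<14), a zero MV, xSb = 64 and xC = 3, two 4:2:0 layers give 1120, i.e. chroma sample 35 of the reference. A 4:4:4 current layer with a 4:2:0 reference gives 1072, i.e. position 33.5 in the half-width reference chroma plane.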

In another embodiment, the chroma phase of the reference layer/picture is also considered. Let the RefChromaHorCollocatedFlag and RefChromaVerCollocatedFlag be the sps_chroma_horizontal_collocated_flag and sps_chroma_vertical_collocated_flag of the reference layer/picture.

addXcur = ( sps_chroma_horizontal_collocated_flag ? 0 : 16 ) * scalingRatio[0] / SubWidthC

addXref = ( RefChromaHorCollocatedFlag ? 0 : 16 ) * ( 1 << 14 ) / RefSubWidthC

addX = addXcur − addXref

addYcur = ( sps_chroma_vertical_collocated_flag ? 0 : 16 ) * scalingRatio[1] / SubHeightC

addYref = ( RefChromaVerCollocatedFlag ? 0 : 16 ) * ( 1 << 14 ) / RefSubHeightC

addY = addYcur − addYref

refxSb_C = ( ( ( xSb − scaling_win_left_offset ) / SubWidthC << 5 ) + refMvCLX[0] ) * scalingRatio[0] + addX

refx_C = ( ( Sign( refxSb_C ) * ( ( Abs( refxSb_C ) + 256 ) >> 9 ) + xC * ( ( scalingRatio[0] + 8 ) >> 4 ) ) * ( SubWidthC / RefSubWidthC ) + fRefLeftOffset / RefSubWidthC + 16 ) >> 5

refySb_C = ( ( ( ySb − scaling_win_top_offset ) / SubHeightC << 5 ) + refMvCLX[1] ) * scalingRatio[1] + addY

refy_C = ( ( Sign( refySb_C ) * ( ( Abs( refySb_C ) + 256 ) >> 9 ) + yC * ( ( scalingRatio[1] + 8 ) >> 4 ) ) * ( SubHeightC / RefSubHeightC ) + fRefTopOffset / RefSubHeightC + 16 ) >> 5
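The phase-aware offset addX (and, symmetrically, addY) can be sketched as follows. The function name is hypothetical, and "/" is again modeled as integer division truncating toward zero, which is an assumption of this sketch.

```python
def div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero (b is assumed positive)."""
    return a // b if a >= 0 else -((-a) // b)


def chroma_phase_offset(cur_collocated: bool, ref_collocated: bool,
                        scaling_ratio: int, sub_c: int, ref_sub_c: int) -> int:
    """Compute addX (or addY) from the collocated flags of both layers.
    sub_c and ref_sub_c are SubWidthC/RefSubWidthC for the horizontal case
    or SubHeightC/RefSubHeightC for the vertical case."""
    add_cur = div_trunc((0 if cur_collocated else 16) * scaling_ratio, sub_c)
    add_ref = div_trunc((0 if ref_collocated else 16) * (1 << 14), ref_sub_c)
    return add_cur - add_ref
```

When both layers are 4:2:0 with the same chroma phase and there is no scaling, the two terms cancel and the offset is zero. When only the reference layer is non-collocated, the offset is −131072, which corresponds to a quarter of a chroma sample given the 1/32-sample units and the 14-bit scaling ratio.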

In another embodiment, the luma MV, refMvLX[ ], is used. The (refxSb_C, refySb_C) and (refx_C, refy_C) are derived by using the corresponding luma sample position and then converting it to the chroma sample position in the reference layer/picture.

In this invention, if the reference layer has fewer color components (e.g. monochrome), a predefined, derived, or signaled value is assigned to the predictors of the missing color components. For example, the value of (1<<(bit_depth−1)) can be used as the predictor of the missing color components.

In another embodiment, if the bit-depths of the two layers are different, a bit-depth truncation or bit-depth extension process is applied. If the lower layer has a lower bit-depth, bit-depth extension is applied: zero bits are inserted after the LSB until the bit-depths of the two layers are the same. If the lower layer has a higher bit-depth, bit-depth truncation is applied: the LSBs are removed until the bit-depths of the two layers are the same.
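Bit-depth extension and truncation map directly onto left and right shifts. The following sketch, with a hypothetical function name, aligns a single reference-layer sample to the bit-depth of the current layer.

```python
def align_reference_sample(sample: int, ref_bit_depth: int, cur_bit_depth: int) -> int:
    """Extend (append zero bits after the LSB) or truncate (drop LSBs) a
    reference sample so that it has the bit-depth of the current layer."""
    diff = cur_bit_depth - ref_bit_depth
    if diff >= 0:
        return sample << diff   # bit-depth extension
    return sample >> -diff      # bit-depth truncation
```

An 8-bit sample of 255 becomes 1020 at 10 bits, and a 10-bit sample of 1023 becomes 255 at 8 bits.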

Method-4: Bit Depth Constraint for Multi-Layer Structure

In this method, to support bit depth scalability for a multi-layer structure, it is proposed to add a bitstream constraint that the bit depth of the higher layer shall be greater than or equal to the bit depth of the lower layer. For example, it is a requirement of bitstream conformance that the value of bit_depth_minus8 of the current layer shall be greater than or equal to the value of bit_depth_minus8 of the reference layers of the current layer.
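A minimal sketch of this constraint, with a hypothetical function name, compares the bit_depth_minus8 value of the current layer against those of all its reference layers.

```python
def bit_depth_constraint_ok(cur_bit_depth_minus8: int,
                            ref_bit_depth_minus8_values) -> bool:
    """Return True when the current layer's bit_depth_minus8 is greater than
    or equal to that of every reference layer."""
    return all(cur_bit_depth_minus8 >= ref for ref in ref_bit_depth_minus8_values)
```

A 10-bit layer (bit_depth_minus8 equal to 2) may therefore reference 8-bit and 10-bit layers, but not a 12-bit layer.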

In this method, it is proposed to consider the bit-depth of the current layer and the bit-depth of the reference layer when performing motion compensation.

Since the reference picture may have a smaller bit depth than the current picture, the decoding process of the inter prediction needs to be modified. In one embodiment, the left shift and right shift parameters in the interpolation filtering process need to consider the bit depths of the reference picture and the current picture. For example, when directly fetching the integer pixel without performing interpolation, the left shift (shift3) needs to consider the bit depth difference between the current picture and the reference picture. The shift3 can be modified from Max(2, 14−BitDepth) to Max(2+BitDepth−RefBitDepth, 14−RefBitDepth). Also, for gradient calculation in prediction refinement with optical flow (PROF), the reference samples can be directly used. If the bit depths of the current picture and the reference picture are different, the shift value should be modified. For example, it can be modified from Max(2, 14−BitDepth) to Max(2+BitDepth−RefBitDepth, 14−RefBitDepth). When doing the interpolation, the input reference sample can be left shifted by an amount, e.g. shift4. The shift4 can be the bit depth difference between the current picture and the reference picture. The proposed text for the scaling window sizes is as follows, with changes highlighted in italics. In the following, the section numbers (e.g., 8.5.6.3.2) are the section numbers in VVC Draft 7.
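The shift derivations quoted above can be compared numerically. The function names below are hypothetical; the formulas are the ones stated in the paragraph.

```python
def shift3_original(bit_depth: int) -> int:
    """Left shift for the integer-sample case when both pictures share a bit depth."""
    return max(2, 14 - bit_depth)


def shift3_modified(bit_depth: int, ref_bit_depth: int) -> int:
    """Left shift that also accounts for the reference picture's bit depth."""
    return max(2 + bit_depth - ref_bit_depth, 14 - ref_bit_depth)


def shift4(bit_depth: int, ref_bit_depth: int) -> int:
    """Left shift applied to reference samples before interpolation."""
    return bit_depth - ref_bit_depth
```

When the two bit depths are equal, the modified formula reduces to the original one. For a 10-bit current picture and an 8-bit reference, shift3 becomes 6, so an 8-bit sample of 128 and the equivalent 10-bit sample of 512 (shifted by 4) both map to 8192.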

8.5.6.3.2 Luma Sample Interpolation Filtering Process

. . .

Output of this process is a predicted luma sample value predSampleLX_L.

The variables shift1, shift2 and shift3 are derived as follows:

- Let the RefBitDepth be the BitDepth of the reference picture. The variable shift1 is set equal to Min(4, BitDepth−8), the variable shift2 is set equal to 6, the variable shift3 is set equal to Max(2+BitDepth−RefBitDepth, 14−RefBitDepth) and the variable shift4 is set equal to (BitDepth−RefBitDepth).
- The variable picW is set equal to pic_width_in_luma_samples and the variable picH is set equal to pic_height_in_luma_samples.

The predicted luma sample value predSampleLX_L is derived as follows:

- If both xFrac_L and yFrac_L are equal to 0, and both scalingRatio[0] and scalingRatio[1] are less than 20481, the value of predSampleLX_L is derived as follows:

predSampleLX_L = refPicLX_L[xInt_3][yInt_3] << shift3

- Otherwise, if yFrac_L is equal to 0 and scalingRatio[1] is less than 20481, the value of predSampleLX_L is derived as follows:

predSampleLX_L = ( Σ_(i=0..7) f_LH[xFrac_L][i] * ( refPicLX_L[xInt_i][yInt_3] << shift4 ) ) >> shift1

- Otherwise, if xFrac_L is equal to 0 and scalingRatio[0] is less than 20481, the value of predSampleLX_L is derived as follows:

predSampleLX_L = ( Σ_(i=0..7) f_LV[yFrac_L][i] * ( refPicLX_L[xInt_3][yInt_i] << shift4 ) ) >> shift1

- Otherwise, the value of predSampleLX_L is derived as follows:
  - The sample array temp[n] with n=0 . . . 7, is derived as follows:

temp[n] = ( Σ_(i=0..7) f_LH[xFrac_L][i] * ( refPicLX_L[xInt_i][yInt_n] << shift4 ) ) >> shift1

  - The predicted luma sample value predSampleLX_L is derived as follows:

predSampleLX_L = ( Σ_(i=0..7) f_LV[yFrac_L][i] * temp[i] ) >> shift2
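The integer-sample branch and the horizontal-only branch of the process above can be checked against each other on a flat region. In the sketch below, the 8-tap coefficients are an illustrative half-sample filter whose taps sum to 64; they are an assumption of this example and are not given in the text, as are the function name and the one-dimensional row layout.

```python
# Illustrative 8-tap half-sample filter (assumption); the taps sum to 64.
HALF_SAMPLE_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def predict_luma_in_row(ref_row, x_int, frac_is_zero, bit_depth, ref_bit_depth):
    """Predict one luma sample from a single reference row, using either the
    integer-sample branch (shift3) or the horizontal filtering branch
    (shift4 on the inputs, shift1 on the sum)."""
    shift1 = min(4, bit_depth - 8)
    shift3 = max(2 + bit_depth - ref_bit_depth, 14 - ref_bit_depth)
    shift4 = bit_depth - ref_bit_depth
    if frac_is_zero:
        return ref_row[x_int] << shift3
    # The eight taps cover positions x_int - 3 .. x_int + 4.
    acc = sum(coeff * (ref_row[x_int - 3 + i] << shift4)
              for i, coeff in enumerate(HALF_SAMPLE_TAPS))
    return acc >> shift1
```

On a constant 8-bit row of 128 with a 10-bit current picture, both branches return 8192, so the two paths deliver intermediate values on the same scale.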

8.5.6.3.3 Luma Integer Sample Fetching Process

Inputs to this process are:

- a luma location in full-sample units (xInt_L, yInt_L),
- the luma reference sample array refPicLX_L,

Output of this process is a predicted luma sample value predSampleLX_L.

- Let the RefBitDepth be the BitDepth of the reference picture. The variable shift is set equal to Max(2+BitDepth−RefBitDepth, 14−RefBitDepth).
- The variable picW is set equal to pic_width_in_luma_samples and the variable picH is set equal to pic_height_in_luma_samples.

8.5.6.3.4 Chroma Sample Interpolation Process

Output of this process is a predicted chroma sample valuepredSampleLX_(C). The variables shift1, shift2 and shift3 are derived asfollows:

-   -   Let the RefBitDepth be the BitDepth of the reference picture.        The variable shift1 is set equal to Min(4, BitDepth−8), the        variable shift2 is set equal to 6, the variable shift3 is set        equal to Max(2+BitDepth RefBitDepth, 14−RefBitDepth) and the        variable shift4 is set equal to (BitDepth−RefBitDepth).    -   The variable picW_(C) is set equal to        pic_width_in_luma_samples/SubWidthC and the variable picH_(C) is        set equal to pic_height_in_luma_samples/SubHeightC.

. . .

The predicted chroma sample value predSampleLX_(C) is derived as follows:

-   -   If both xFrac_(C) and yFrac_(C) are equal to 0, and both        scalingRatio[0] and scalingRatio[1] are less than 20481, the        value of predSampleLX_(C) is derived as follows:

predSampleLX_(C)=refPicLX_(C)[xInt₁][yInt₁]<<shift3

-   -   Otherwise, if yFrac_(C) is equal to 0 and scalingRatio[1] is        less than 20481, the value of predSampleLX_(C) is derived as        follows:

predSampleLX_(C)=(Σ_(i=0) ³ f_(CH)[xFrac_(C)][i]*(refPicLX_(C)[xInt_(i)][yInt₁]<<shift4))>>shift1

-   -   Otherwise, if xFrac_(C) is equal to 0 and scalingRatio[0] is        less than 20481, the value of predSampleLX_(C) is derived as        follows:

predSampleLX_(C)=(Σ_(i=0) ³ f_(CV)[yFrac_(C)][i]*(refPicLX_(C)[xInt₁][yInt_(i)]<<shift4))>>shift1

-   -   Otherwise, the value of predSampleLX_(C) is derived as follows:        -   The sample array temp[n] with n=0 . . . 3, is derived as            follows:

temp[n]=(Σ_(i=0) ³ f_(CH)[xFrac_(C)][i]*(refPicLX_(C)[xInt_(i)][yInt_(n)]<<shift4))>>shift1

-   -   -   The predicted chroma sample value predSampleLX_(C) is            derived as follows:

predSampleLX_(C)=(f _(CV)[yFrac_(C)][0]*temp[0]+f_(CV)[yFrac_(C)][1]*temp[1]+f _(CV)[yFrac_(C)][2]*temp[2]+f_(CV)[yFrac_(C)][3]*temp[3])>>shift2

In another embodiment, the left shift and right shift parameters in the interpolation filtering process need to consider the bit depths of the reference picture and the current picture. For example, when directly fetching the integer pixel without performing the interpolation, the left shift (shift3) needs to consider the bit depth difference between the current picture and the reference picture. The shift3 can be modified from Max(2, 14−BitDepth) to Max(2+BitDepth−RefBitDepth, 14−RefBitDepth). Also, for gradient calculation in prediction refinement with optical flow (PROF), the reference samples are directly used. If the bit depths of the current picture and the reference picture are different, the shift value should be modified. For example, it can be modified from Max(2, 14−BitDepth) to Max(2+BitDepth−RefBitDepth, 14−RefBitDepth). When performing the interpolation, the input reference sample can be left shifted by an amount, such as shift4. The shift4 can be the bit depth difference between the current picture and the reference picture. The right shift in the first-stage interpolation filter (e.g. shift1) can also consider the bit depth difference between the current picture and the reference picture. The shift1 and shift4 can compensate for each other: only one of them can be a non-zero positive integer, so if one of them is a positive integer, the other is zero. Alternatively, both shift1 and shift4 can be zero. The shift1 can be modified from Min(4, BitDepth−8) to Max(0,Min(RefBitDepth+4−BitDepth, RefBitDepth−8)). The shift4 can be modified as Max(0,Max(BitDepth−RefBitDepth−4, 8−RefBitDepth)). An example of the proposed text for the scaling window sizes based on VVC Draft 7 is as follows and highlighted in Italic.

8.5.6.3.2 Luma Sample Interpolation Filtering Process

Output of this process is a predicted luma sample value predSampleLX_(L). The variables shift1, shift2 and shift3 are derived as follows:

-   -   Let the RefBitDepth be the BitDepth of the reference picture.        The variable shift1 is set equal to        Max(0,Min(RefBitDepth+4−BitDepth, RefBitDepth−8)), the variable        shift2 is set equal to 6, the variable shift3 is set equal to        Max(2+BitDepth−RefBitDepth, 14−RefBitDepth) and the variable        shift4 is set equal to Max(0,Max(BitDepth−RefBitDepth−4,        8−RefBitDepth)).    -   The variable picW is set equal to pic_width_in_luma_samples and        the variable picH is set equal to pic_height_in_luma_samples.

The predicted luma sample value predSampleLX_(L) is derived as follows:

-   -   If both xFrac_(L) and yFrac_(L) are equal to 0, and both        scalingRatio[0] and scalingRatio[1] are less than 20481, the        value of predSampleLX_(L) is derived as follows:

predSampleLX_(L)=refPicLX_(L)[xInt₃][yInt₃]<<shift3

-   -   Otherwise, if yFrac_(L) is equal to 0 and scalingRatio[1] is        less than 20481, the value of predSampleLX_(L) is derived as        follows:

predSampleLX_(L)=(Σ_(i=0) ⁷ f_(LH)[xFrac_(L)][i]*(refPicLX_(L)[xInt_(i)][yInt₃]<<shift4))>>shift1

-   -   Otherwise, if xFrac_(L) is equal to 0 and scalingRatio[0] is        less than 20481, the value of predSampleLX_(L) is derived as        follows:

predSampleLX_(L)=(Σ_(i=0) ⁷ f_(LV)[yFrac_(L)][i]*(refPicLX_(L)[xInt₃][yInt_(i)]<<shift4))>>shift1

-   -   Otherwise, the value of predSampleLX_(L) is derived as follows:        -   The sample array temp[n] with n=0 . . . 7, is derived as            follows:

temp[n]=(Σ_(i=0) ⁷ f_(LH)[xFrac_(L)][i]*(refPicLX_(L)[xInt_(i)][yInt_(n)]<<shift4))>>shift1

-   -   -   The predicted luma sample value predSampleLX_(L) is derived            as follows:

predSampleLX_(L)=(Σ_(i=0) ⁷ f _(LV)[yFrac_(L)][i]*temp[i])>>shift2
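The two-stage case above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the 8×8 `window` argument stands in for refPicLX_(L)[xInt_(i)][yInt_(n)], and the half-sample taps are used merely as an assumed example row of the f_(L) coefficient table.

```python
# Assumed example filter row (half-sample position); the taps sum to 64.
HALF_PEL = [-1, 4, -11, 40, 40, -11, 4, -1]


def derive_shifts(bit_depth, ref_bit_depth):
    """shift1, shift2 and shift4 as given in the proposed text above."""
    shift1 = max(0, min(ref_bit_depth + 4 - bit_depth, ref_bit_depth - 8))
    shift2 = 6
    shift4 = max(0, max(bit_depth - ref_bit_depth - 4, 8 - ref_bit_depth))
    return shift1, shift2, shift4


def interp_luma_2d(window, f_h, f_v, bit_depth, ref_bit_depth):
    """window[n][i] plays the role of refPicLX[xInt_i][yInt_n], n, i = 0..7."""
    shift1, shift2, shift4 = derive_shifts(bit_depth, ref_bit_depth)
    # First stage: horizontal filtering of each of the 8 rows gives temp[n].
    temp = [
        sum(f_h[i] * (row[i] << shift4) for i in range(8)) >> shift1
        for row in window
    ]
    # Second stage: vertical filtering of the intermediate samples.
    return sum(f_v[i] * temp[i] for i in range(8)) >> shift2


# A flat 8-bit reference area predicted into a 10-bit current picture.
flat_8bit = [[100] * 8 for _ in range(8)]
pred_10_from_8 = interp_luma_2d(flat_8bit, HALF_PEL, HALF_PEL, 10, 8)

# A flat 8-bit reference area predicted into a 16-bit current picture.
pred_16_from_8 = interp_luma_2d(flat_8bit, HALF_PEL, HALF_PEL, 16, 8)
```

For a flat area, the interpolated value in this sketch equals the integer-position value refPicLX_(L)<<shift3, so fractional and integer positions stay at the same intermediate precision.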

8.5.6.3.3 Luma Integer Sample Fetching Process

Inputs to this process are:

-   -   a luma location in full-sample units (xInt_(L), yInt_(L)),    -   the luma reference sample array refPicLX_(L),

Output of this process is a predicted luma sample value predSampleLX_(L).

-   -   Let the RefBitDepth be the BitDepth of the reference picture.        The variable shift is set equal to Max(2+BitDepth−RefBitDepth,        14−RefBitDepth).

The variable picW is set equal to pic_width_in_luma_samples and the variable picH is set equal to pic_height_in_luma_samples.

8.5.6.3.4 Chroma Sample Interpolation Process

Output of this process is a predicted chroma sample value predSampleLX_(C). The variables shift1, shift2 and shift3 are derived as follows:

-   -   Let the RefBitDepth be the BitDepth of the reference picture.        The variable shift1 is set equal to        Max(0,Min(RefBitDepth+4−BitDepth, RefBitDepth−8)), the variable        shift2 is set equal to 6, the variable shift3 is set equal to        Max(2+BitDepth−RefBitDepth, 14−RefBitDepth) and the variable        shift4 is set equal to Max(0,Max(BitDepth−RefBitDepth−4,        8−RefBitDepth)).    -   The variable picW_(C) is set equal to        pic_width_in_luma_samples/SubWidthC and the variable picH_(C) is        set equal to pic_height_in_luma_samples/SubHeightC.

The predicted chroma sample value predSampleLX_(C) is derived as follows:

-   -   If both xFrac_(C) and yFrac_(C) are equal to 0, and both        scalingRatio[0] and scalingRatio[1] are less than 20481, the        value of predSampleLX_(C) is derived as follows:

predSampleLX_(C)=refPicLX_(C)[xInt₁][yInt₁]<<shift3

-   -   Otherwise, if yFrac_(C) is equal to 0 and scalingRatio[1] is        less than 20481, the value of predSampleLX_(C) is derived as        follows:

predSampleLX_(C)=(Σ_(i=0) ³ f_(CH)[xFrac_(C)][i]*(refPicLX_(C)[xInt_(i)][yInt₁]<<shift4))>>shift1

-   -   Otherwise, if xFrac_(C) is equal to 0 and scalingRatio[0] is        less than 20481, the value of predSampleLX_(C) is derived as        follows:

predSampleLX_(C)=(Σ_(i=0) ³ f_(CV)[yFrac_(C)][i]*(refPicLX_(C)[xInt₁][yInt_(i)]<<shift4))>>shift1

-   -   Otherwise, the value of predSampleLX_(C) is derived as follows:    -   The sample array temp[n] with n=0 . . . 3, is derived as        follows:

temp[n]=(Σ_(i=0) ³ f_(CH)[xFrac_(C)][i]*(refPicLX_(C)[xInt_(i)][yInt_(n)]<<shift4))>>shift1

-   -   The predicted chroma sample value predSampleLX_(C) is derived as        follows:

predSampleLX_(C)=(f_(CV)[yFrac_(C)][0]*temp[0]+f_(CV)[yFrac_(C)][1]*temp[1]+f_(CV)[yFrac_(C)][2]*temp[2]+f_(CV)[yFrac_(C)][3]*temp[3])>>shift2

Note that, if the bit depth of the current picture (BitDepth) is smaller than or equal to 12 bits, the shift4 is always equal to 0. For a profile that supports only bit depths smaller than or equal to 12 bits (e.g. the Main 10, Main 12, Monochrome 12, Main 4:4:4/4:2:2 10/12 profiles in HEVC), the modification related to shift4 can be removed.
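The two properties stated for shift1 and shift4 can be checked exhaustively with a short Python sketch. The range of 8 to 16 bits is an assumption made for this illustration, and the function names are hypothetical; the formulas are those of the proposed text above.

```python
def shift1(bit_depth, ref_bit_depth):
    """Max(0, Min(RefBitDepth + 4 - BitDepth, RefBitDepth - 8))"""
    return max(0, min(ref_bit_depth + 4 - bit_depth, ref_bit_depth - 8))


def shift4(bit_depth, ref_bit_depth):
    """Max(0, Max(BitDepth - RefBitDepth - 4, 8 - RefBitDepth))"""
    return max(0, max(bit_depth - ref_bit_depth - 4, 8 - ref_bit_depth))


DEPTHS = range(8, 17)  # assumed bit depth range for this check: 8..16

# shift1 and shift4 are never both non-zero.
mutually_exclusive = all(
    shift1(b, r) == 0 or shift4(b, r) == 0 for b in DEPTHS for r in DEPTHS
)

# shift4 vanishes whenever the current picture has at most 12 bits.
zero_up_to_12_bits = all(
    shift4(b, r) == 0 for b in DEPTHS if b <= 12 for r in DEPTHS
)

# A case that does need shift4: 16-bit current picture, 8-bit reference.
wide_gap = (shift1(16, 8), shift4(16, 8))
```

Under the stated range assumption, both checks hold, and a non-zero shift4 only appears when the current picture exceeds the reference picture by more than 4 bits.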

In another embodiment, the left shift and right shift parameters in the interpolation filtering process need to consider the bit depth of the current picture. For example, when directly fetching the integer pixel without performing the interpolation, the left shift (shift3) can be modified from Max(2, 14−BitDepth) to Max(2, 14−RefBitDepth). Also, for gradient calculation in prediction refinement with optical flow (PROF), the reference samples are directly used. If the bit depths of the current picture and the reference picture are different, the shift value should be modified. For example, it can be modified from Max(2, 14−BitDepth) to Max(2, 14−RefBitDepth). When doing the interpolation, the right shift in the first-stage interpolation filter, such as shift1, can consider the bit depth of the reference picture. The shift1 can be modified from Min(4, BitDepth−8) to Min(4, RefBitDepth−8). The proposed text for the scaling window sizes is as follows and highlighted in Italic.

8.5.6.3.2 Luma Sample Interpolation Filtering Process

Output of this process is a predicted luma sample value predSampleLX_(L). The variables shift1, shift2 and shift3 are derived as follows:

-   -   Let the RefBitDepth be the BitDepth of the reference picture.    -   The variable shift1 is set equal to Min(4, RefBitDepth−8), the        variable shift2 is set equal to 6 and the variable shift3 is set        equal to Max(2, 14−RefBitDepth).    -   The variable picW is set equal to pic_width_in_luma_samples and        the variable picH is set equal to pic_height_in_luma_samples.

8.5.6.3.3 Luma Integer Sample Fetching Process

Inputs to this process are:

-   -   a luma location in full-sample units (xInt_(L), yInt_(L)),    -   the luma reference sample array refPicLX_(L),

Output of this process is a predicted luma sample value predSampleLX_(L).

Let the RefBitDepth be the BitDepth of the reference picture.

The variable shift is set equal to Max(2, 14−RefBitDepth).

8.5.6.3.4 Chroma Sample Interpolation Process

Output of this process is a predicted chroma sample value predSampleLX_(C). The variables shift1, shift2 and shift3 are derived as follows:

-   -   Let the RefBitDepth be the BitDepth of the reference picture.    -   The variable shift1 is set equal to Min(4, RefBitDepth−8), the        variable shift2 is set equal to 6 and the variable shift3 is set        equal to Max(2, 14−RefBitDepth).    -   The variable picW_(C) is set equal to        pic_width_in_luma_samples/SubWidthC and the variable picH_(C) is        set equal to pic_height_in_luma_samples/SubHeightC.

In the above-mentioned method, the bit depth or bit shift in the weighted sample prediction process uses the bit depth of the current picture instead of that of the reference picture.
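How the two bit depths divide the work in this embodiment can be illustrated with a small Python sketch. The fetching shift follows the text above (Max(2, 14−RefBitDepth)); the final rounding stage is an assumption modeled on the default uni-prediction rounding with a shift of 14−BitDepth, and the sketch assumes a BitDepth no greater than 14. All function names are hypothetical.

```python
def fetch_integer_sample(ref_sample, ref_bit_depth):
    """Uses the reference picture's bit depth: shift3 = Max(2, 14 - RefBitDepth)."""
    return ref_sample << max(2, 14 - ref_bit_depth)


def default_uni_prediction(pred_sample, bit_depth):
    """Assumed final stage: rounds with the CURRENT picture's bit depth
    (shift = 14 - BitDepth) and clips to the current sample range.
    Assumes bit_depth <= 14."""
    shift = 14 - bit_depth
    offset = (1 << (shift - 1)) if shift > 0 else 0
    value = (pred_sample + offset) >> shift
    return min(max(value, 0), (1 << bit_depth) - 1)


# 8-bit reference sample predicted into a 10-bit current picture.
up = default_uni_prediction(fetch_integer_sample(200, 8), 10)

# 10-bit reference sample predicted into an 8-bit current picture.
down = default_uni_prediction(fetch_integer_sample(800, 10), 8)
```

In this sketch the reference-side shift brings every sample to a common intermediate precision, and the current-side shift converts that precision to the output bit depth, so an 8-bit value of 200 and a 10-bit value of 800 map onto each other.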

In another embodiment, all the input reference samples undergo bit depth extension or bit depth reduction to match the bit depth of the current picture. When performing the bit depth reduction, direct truncation or rounding can be used. When performing the bit depth extension, n bits of zeros (e.g. 2 bits of ‘00’) can be added after the LSB. In another example, when performing the bit depth extension, n bits of ones (e.g. 2 bits of ‘11’) can be added after the LSB. In another example, when performing the bit depth extension, one bit of 1 and n−1 bits of 0 (e.g. 2 bits of ‘10’, 4 bits of ‘1000’) can be added after the LSB. In another example, when performing the bit depth extension, n bits that are signaled, predefined or derived can be added after the LSB.
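The extension and reduction options listed above can be sketched in Python as follows. The function names, the `mode` strings and the clipping after rounding are assumptions introduced for this illustration and are not taken from the proposed text.

```python
def reduce_bit_depth(sample, src_depth, dst_depth, rounding=False):
    """Drop (src_depth - dst_depth) LSBs by truncation or by rounding."""
    n = src_depth - dst_depth
    if n <= 0:
        return sample
    if rounding:
        sample += 1 << (n - 1)
    # Clipping keeps a rounded-up maximum inside the target range (assumed).
    return min(sample >> n, (1 << dst_depth) - 1)


def extend_bit_depth(sample, n, mode="zeros", pattern=0):
    """Append n bits after the LSB of the sample."""
    if n <= 0:
        return sample
    if mode == "zeros":        # e.g. '00'
        fill = 0
    elif mode == "ones":       # e.g. '11'
        fill = (1 << n) - 1
    elif mode == "one_then_zeros":  # e.g. '10' or '1000'
        fill = 1 << (n - 1)
    elif mode == "pattern":    # signaled, predefined or derived bits
        fill = pattern & ((1 << n) - 1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (sample << n) | fill


sample_8bit = 0b10110011  # 179
extended = {
    mode: extend_bit_depth(sample_8bit, 2, mode)
    for mode in ("zeros", "ones", "one_then_zeros")
}
truncated = reduce_bit_depth(718, 10, 8)
rounded = reduce_bit_depth(718, 10, 8, rounding=True)
```

Appending a single 1 followed by zeros places the extended value at the middle of the interval that the shorter sample represents, whereas zeros and ones place it at the bottom and top of that interval.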

The abovementioned methods can be combined and applied in all or in part.

Any of the foregoing proposed methods can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in a scaling or motion compensation module or parameter determining module of an encoder, and/or a scaling or motion compensation module or parameter determining module of a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the scaling or motion compensation module or parameter determining module of the encoder and/or the scaling or motion compensation module or parameter determining module of the decoder, so as to provide the information needed by the scaling or motion compensation module or parameter determining module.

Video encoders should follow the foregoing syntax design so as to generate a conforming bitstream, and video decoders are able to decode the bitstream correctly only if the parsing process complies with the foregoing syntax design. When syntax elements are skipped in the bitstream, encoders and decoders should set the values of the skipped syntax elements to the inferred values to ensure no mismatch between the encoding and decoding results.

FIG. 1 illustrates an exemplary block diagram of a system incorporating constrained layer-wise video coding according to an embodiment of the present invention, where both the bit depth values and the chroma format index values for the current layer and the reference layer are the same. The steps shown in the flowchart, as well as other following flowcharts in this disclosure, may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side and/or the decoder side. The steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to this method, a bitstream is generated at an encoder side or received at a decoder side in step 110, where the bitstream corresponds to coded data of current video data in a current layer. The bitstream complies with a bitstream conformance requirement corresponding to both the bit depth values for the current layer and the reference layer being the same and the chroma format index values for the current layer and the reference layer being the same. The current video data in the current layer is then encoded or decoded by utilizing reference video data in the reference layer in step 120.
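The conformance requirement of FIG. 1 can be expressed as a simple check. The following Python sketch is illustrative only: the `LayerFormat` container and the function name are hypothetical, while the field names mirror the bit_depth_minus8 and chroma_format_idc syntax elements referred to in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerFormat:
    """Hypothetical per-layer container for the two relevant syntax elements."""
    bit_depth_minus8: int
    chroma_format_idc: int


def conforms_equal(current: LayerFormat, reference: LayerFormat) -> bool:
    """True when both the bit depth and the chroma format index match."""
    return (
        current.bit_depth_minus8 == reference.bit_depth_minus8
        and current.chroma_format_idc == reference.chroma_format_idc
    )


base = LayerFormat(bit_depth_minus8=2, chroma_format_idc=1)      # 10-bit, 4:2:0
same = LayerFormat(bit_depth_minus8=2, chroma_format_idc=1)
deeper = LayerFormat(bit_depth_minus8=4, chroma_format_idc=1)    # 12-bit, 4:2:0

ok = conforms_equal(same, base)
rejected = conforms_equal(deeper, base)
```

An encoder could run such a check before enabling inter-layer prediction between two layers, and a conformance checker could run it on parsed parameter sets.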

FIG. 2 illustrates an exemplary block diagram of a system incorporating constrained layer-wise video coding according to an embodiment of the present invention, where the bit-depth for the current layer is greater than or equal to the bit-depth for a reference layer, or the chroma format index for the current layer is greater than or equal to the chroma format index for the reference layer. According to this method, a bitstream is generated at an encoder side or received at a decoder side in step 210, where the bitstream corresponds to coded data of current video data in a current layer. The bitstream complies with a bitstream conformance requirement comprising a first condition, a second condition, or both the first condition and the second condition. The first condition corresponds to a first bit-depth for the current layer being greater than or equal to a second bit-depth for a reference layer. The second condition corresponds to a first chroma format index for the current layer being greater than or equal to a second chroma format index for the reference layer, where the first chroma format index and the second chroma format index specify chroma sampling relative to luma sampling for the current layer and the reference layer respectively. A larger chroma format index value indicates a higher sampling density. In step 220, the current video data in the current layer is then encoded or decoded by utilizing reference video data in the reference layer along with the first bit-depth and the second bit-depth, the first chroma format index and the second chroma format index, or both the first bit-depth and the second bit-depth and the first chroma format index and the second chroma format index.
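The first and second conditions of FIG. 2 can be sketched as a check in which either condition, or both, may be part of the conformance requirement. The function name and the two selector flags are hypothetical and serve only this illustration.

```python
def conforms_ordered(
    cur_bit_depth,
    ref_bit_depth,
    cur_chroma_format_idc,
    ref_chroma_format_idc,
    require_first=True,
    require_second=True,
):
    """Check the first condition (bit depth), the second condition
    (chroma format index), or both, depending on the selector flags."""
    first = cur_bit_depth >= ref_bit_depth
    second = cur_chroma_format_idc >= ref_chroma_format_idc
    return (first or not require_first) and (second or not require_second)


# 10-bit 4:2:2 current layer (idc 2) over an 8-bit 4:2:0 reference layer (idc 1).
both_ok = conforms_ordered(10, 8, 2, 1)

# 8-bit current layer over a 10-bit reference layer violates the first condition.
first_violated = conforms_ordered(8, 10, 2, 1)

# The same pair passes when only the second condition is required.
second_only = conforms_ordered(8, 10, 2, 1, require_first=False)
```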

FIG. 3 illustrates an exemplary block diagram of a system incorporating constrained layer-wise video coding according to an embodiment of the present invention, where the motion compensation utilizes information comprising chroma formats or one or more variables related to the chroma format for both the current layer and the reference layer, or bit-depth values for both the current layer and the reference layer. According to this method, input data are received in step 310, where the input data correspond to video data in a current layer at a video encoder side or the input data correspond to coded video data in the current layer at a video decoder side. Motion compensation is applied to the video data in the current layer at the encoder side or to the coded video data in the current layer at the video decoder side by utilizing reference video data in a reference layer in step 320. The motion compensation utilizes information comprising chroma formats or one or more variables related to the chroma format for both the current layer and the reference layer, bit-depth values for both the current layer and the reference layer, or both the chroma formats or said one or more variables related to the chroma format for both the current layer and the reference layer and the bit-depth values for both the current layer and the reference layer.
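As an illustration of the chroma-format variables that motion compensation may use, the following Python sketch derives SubWidthC and SubHeightC from chroma_format_idc and applies them per layer. The lookup table follows the usual mapping of chroma_format_idc values 0 to 3; the position mapping function is a hypothetical simplification that ignores scaling ratios, fractional positions and chroma phase, and only shows the use of the ratio between the sub-sampling factors of the two layers.

```python
# chroma_format_idc -> (SubWidthC, SubHeightC)
SUBSAMPLING = {
    0: (1, 1),  # monochrome
    1: (2, 2),  # 4:2:0
    2: (2, 1),  # 4:2:2
    3: (1, 1),  # 4:4:4
}


def chroma_picture_size(pic_width_in_luma_samples, pic_height_in_luma_samples,
                        chroma_format_idc):
    """picW_C and picH_C as defined in the chroma interpolation process."""
    sub_w, sub_h = SUBSAMPLING[chroma_format_idc]
    return pic_width_in_luma_samples // sub_w, pic_height_in_luma_samples // sub_h


def map_chroma_position(x_c, y_c, cur_idc, ref_idc):
    """Hypothetical integer mapping of a current-layer chroma position onto
    the reference layer's chroma grid via the sub-sampling factor ratios."""
    cur_w, cur_h = SUBSAMPLING[cur_idc]
    ref_w, ref_h = SUBSAMPLING[ref_idc]
    return x_c * cur_w // ref_w, y_c * cur_h // ref_h


# 4:4:4 current layer referencing a 4:2:0 reference layer.
cur_size = chroma_picture_size(1920, 1080, 3)
ref_size = chroma_picture_size(1920, 1080, 1)
ref_pos = map_chroma_position(10, 6, cur_idc=3, ref_idc=1)
```

Using the reference layer's own factors for picW_(C) and picH_(C) keeps the clipping of reference positions inside the reference chroma array even when the two layers use different chroma formats.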

The flowcharts shown above are intended to serve as examples to illustrate embodiments of the present invention. A person skilled in the art may practice the present invention by modifying individual steps, or by splitting or combining steps, without departing from the spirit of the present invention.

The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced.

Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more electronic circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.

The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

1. A method for video coding, wherein a multi-layer prediction mode is supported, the method comprising: generating, at an encoder side, or receiving, at a decoder side, a bitstream corresponding to coded data of current video data in a current layer, wherein the bitstream complies with a bitstream conformance requirement comprising a first condition, a second condition, or both the first condition and the second condition, wherein the first condition corresponds to that first bit-depth for the current layer is greater than or equal to second bit-depth for a reference layer, wherein the second condition corresponds to that first chroma format index for the current layer is greater than or equal to second chroma format index for the reference layer, and wherein the first chroma format index and the second chroma format index specify chroma sampling relative to luma sampling for the current layer and the reference layer respectively, and a larger chroma format index value indicates a higher sub-sampling density; and encoding, at the encoder side, or decoding at the decoder side, the current video data in the current layer by utilizing reference video data in the reference layer along with the first bit-depth and the second bit-depth, the first chroma format index and the second chroma format index, or both the first bit-depth and the second bit-depth and the first chroma format index and the second chroma format index.
 2. The method of claim 1, wherein the bit depth for the current layer and the reference layer are derived from bit_depth_minus8 syntax elements in the bitstream, and wherein the bit_depth_minus8 syntax elements correspond to the bit depth minus 8 for the current layer and the reference layer respectively.
 3. The method of claim 1, wherein the chroma format index for the current layer and the reference layer are determined according to chroma_format_idc syntax elements in the bitstream, and wherein the chroma_format_idc syntax elements specify chroma sampling relative to luma sampling for the current layer and the reference layer respectively.
 4. A method for coding a video sequence, wherein a multi-layer prediction mode is supported, the method comprising: receiving input data, wherein the input data correspond to video data in a current layer at a video encoder side or the input data correspond to coded video data in the current layer at a video decoder side; and applying motion compensation to the video data in the current layer at the encoder side or to the coded video data in the current layer at the video decoder side by utilizing reference video data in a reference layer, wherein the motion compensation utilizes information comprising chroma formats or one or more variables related to the chroma format for both the current layer and the reference layer, bit-depth values for both the current layer and the reference layer, or both the chroma formats or said one or more variables related to the chroma format for both the current layer and the reference layer and the bit-depth values for both the current layer and the reference layer.
 5. The method of claim 4, wherein the bit-depth values are derived from syntax elements signaled in a bitstream or parsed from the bitstream, and wherein syntax elements correspond to the bit-depth values minus 8 for the current layer and the reference layer respectively.
 6. The method of claim 4, wherein the chroma formats or said one or more variables comprise a first variable corresponding to horizontal chroma sub-sampling factor, a second variable corresponding to vertical chroma sub-sampling factor, or both the first variable and the second variable.
 7. The method of claim 6, wherein a reference sample position of the reference layer is calculated in a motion compensation process by utilizing the horizontal chroma sub-sampling factor, the vertical chroma sub-sampling factor, or both.
 8. The method of claim 6, wherein a reference sample position of the reference layer is calculated in a motion compensation process by utilizing a first ratio of a first horizontal chroma sub-sampling factor of the current layer to a second horizontal chroma sub-sampling factor of the reference layer, a second ratio of a first vertical chroma sub-sampling factor of the current layer to a second vertical chroma sub-sampling factor of the reference layer, or both. 